Corpus-Based Identification of Non-Anaphoric Noun Phrases

Author

  • David L. Bean
Abstract

Coreference resolution involves finding antecedents for anaphoric discourse entities, such as definite noun phrases. But many definite noun phrases are not anaphoric because their meaning can be understood from general world knowledge (e.g., "the White House" or "the news media"). We have developed a corpus-based algorithm for automatically identifying definite noun phrases that are non-anaphoric, which has the potential to improve the efficiency and accuracy of coreference resolution systems. Our algorithm generates lists of non-anaphoric noun phrases and noun phrase patterns from a training corpus and uses them to recognize non-anaphoric noun phrases in new texts. Using 1600 MUC-4 terrorism news articles as the training corpus, our approach achieved 78% recall and 87% precision at identifying such noun phrases in 50 test documents.
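
The abstract says only that the algorithm builds lists of non-anaphoric noun phrases and noun phrase patterns from a training corpus and then looks them up in new text. The Python sketch below illustrates that two-phase idea under assumptions that are not taken from the paper: the input is already chunked into noun phrases; a definite NP in a document's first sentence is treated as non-anaphoric because there is no earlier text it could refer back to; a "pattern" is simply a head noun (the last word) seen at least `min_head_count` times; and all function names are hypothetical.

```python
from collections import Counter


def is_definite(phrase):
    """Return True if the noun phrase begins with the definite article 'the'."""
    return phrase.lower().startswith("the ")


def head(phrase):
    """Crude head-noun guess (assumption): the last word of the phrase."""
    return phrase.lower().split()[-1]


def build_lists(docs, min_head_count=2):
    """Build a non-anaphoric NP list and head-noun patterns from training docs.

    docs: list of documents; each document is a list of sentences and each
    sentence is a list of noun-phrase strings (assumed already chunked).
    Assumption: a definite NP in a document's first sentence has no earlier
    text to refer back to, so it is collected as a non-anaphoric example.
    """
    np_list = set()
    head_counts = Counter()
    for doc in docs:
        if not doc:
            continue
        for phrase in doc[0]:  # first sentence only
            if is_definite(phrase):
                np_list.add(phrase.lower())
                head_counts[head(phrase)] += 1
    # Heads seen at least min_head_count times generalize to "the ... <head>".
    head_patterns = {h for h, c in head_counts.items() if c >= min_head_count}
    return np_list, head_patterns


def is_non_anaphoric(phrase, np_list, head_patterns):
    """Label a definite NP in new text non-anaphoric by list or pattern match."""
    if not is_definite(phrase):
        return False
    return phrase.lower() in np_list or head(phrase) in head_patterns


def recall_precision(predicted, gold):
    """Recall and precision of a set of predicted NPs against a gold set."""
    tp = len(predicted & gold)
    recall = tp / len(gold) if gold else 0.0
    precision = tp / len(predicted) if predicted else 0.0
    return recall, precision


if __name__ == "__main__":
    train = [
        [["the White House", "a statement"], ["the spokesman"]],
        [["the news media", "the White House"], ["the report"]],
        [["the national police", "a bomb"], ["the bomb"]],
    ]
    np_list, patterns = build_lists(train)
    print(is_non_anaphoric("the news media", np_list, patterns))   # True (list match)
    print(is_non_anaphoric("the Opera House", np_list, patterns))  # True (head "house")
    print(is_non_anaphoric("the bomb", np_list, patterns))         # False
```

The head-noun threshold is what lets the sketch generalize beyond exact phrases seen in training (here "the Opera House" matches because "house" appeared twice); with real data both the threshold and the notion of a pattern would need tuning. The 78% recall and 87% precision reported above come from the authors' method, not from this sketch.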

Similar resources

SECURING INTERPRETABILITY OF FUZZY MODELS FOR MODELING NONLINEAR MIMO SYSTEMS USING A HYBRID OF EVOLUTIONARY ALGORITHMS

In this study, a Multi-Objective Genetic Algorithm (MOGA) is utilized to extract interpretable and compact fuzzy rule bases for modeling nonlinear Multi-input Multi-output (MIMO) systems. In the process of nonlinear system identification, structure selection, parameter estimation, model performance and model validation are important objectives. Furthermore, securing low-level and high-level ...

Automatic Term Identification and Classification in Biology Texts

The rapid growth of collections in online academic databases has meant that there is increasing difficulty for experts who want to access information in a timely and efficient way. We seek here to explore the application of information extraction methods to the identification and classification of terms in biological abstracts from MEDLINE. We explore the use of a statistical method and a decision tr...

Ambiguity reduction in speaker identification by the relaxation labeling process

A nonlinear probabilistic model of the relaxation labeling (RL) process is implemented in the speaker identification task in order to disambiguate the labeling of the speech feature vectors. In this proposed algorithm, the deterministic labeling of the vector quantization (VQ)-based speaker identification is relaxed by means of introducing initial probabilistic weights to the labeling process of ...

Topic recognition for news speech based on keyword spotting

This paper describes topic identification for Japanese TV news speech based on the keyword spotting technique. Three thousand nouns are selected as keywords that contribute to topic identification, based on mutual information and word length. This set of keywords identified the correct topic for 76.3% of articles from newspaper text data. Further, we performed keywor...


Publication date: 1999